Remedies against the Vocabulary Gap in Information Retrieval

Author

  • Christophe Van Gysel
Abstract

Search engines rely heavily on term-based approaches that represent queries and documents as bags of words. A text (a document or a query) is represented by the bag of its words, which ignores grammar and word order but retains word frequency counts. When presented with a search query, the engine ranks documents by relevance score, computed from, among other things, the degree of matching between query terms and document terms. While term-based approaches are intuitive and effective in practice, they rest on the hypothesis that documents containing the exact query terms are highly relevant regardless of query semantics. Conversely, term-based approaches assume that documents that do not contain the query terms are irrelevant. However, it is known that a high degree of matching at the term level does not necessarily imply high relevance and, vice versa, documents that match none of the query terms may still be relevant. Consequently, there is a vocabulary gap between queries and documents, which occurs when the two use different words to describe the same concepts. This dissertation addresses how to alleviate the effects of this vocabulary gap. More specifically, we propose (1) methods to formulate an effective query from complex textual structures and (2) latent vector space models that circumvent the vocabulary gap in information retrieval.
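To make the term-matching hypothesis and the resulting vocabulary gap concrete, here is a minimal Python sketch. It assumes a naive tokenizer (lowercase, split on non-alphanumeric characters) and a toy score that sums the document frequencies of the distinct query terms. The helpers `bag_of_words` and `term_match_score` and the two example documents are hypothetical illustrations, not the retrieval models proposed in the dissertation.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on non-alphanumeric characters, and count terms.
    Grammar and word order are discarded; only term frequencies remain."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def term_match_score(query: str, document: str) -> int:
    """Toy (hypothetical) relevance score: the sum of the document's
    frequencies for each distinct query term. A document that shares
    no terms with the query always scores 0."""
    q = bag_of_words(query)
    d = bag_of_words(document)
    # Counter returns 0 for missing keys, so unmatched terms add nothing.
    return sum(d[term] for term in q)


docs = {
    "d1": "Cheap car rental at the airport. Car hire from 20 euros.",
    "d2": "Affordable automobile hire near the terminal.",
}
query = "cheap car rental"

for doc_id, text in docs.items():
    print(doc_id, term_match_score(query, text))
# prints:
# d1 4
# d2 0
```

Document `d2` plausibly satisfies the same information need, yet it scores 0 because it says "affordable automobile hire" where the query says "cheap car rental". This is exactly the mismatch that purely term-based matching cannot bridge.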

Similar resources

Information Retrieval for Bridging Vocabulary Gap between Health Seekers and Providers

In this paper we describe how to bridge the vocabulary gap between health seekers and providers using a novel scheme that codes medical records by jointly using local mining and global mining. Local mining uses individual medical records to derive conclusions about individual health and map them into authenticated terminology. Global mining combines medical records of similar types and analyzes them to derive ...

Factors Affecting Student's Scientific Information Retrieval based on Fuzzy Logic Method Compared to Traditional Method

Background and aim: The aim of this study was to identify the factors affecting students' performance in information retrieval based on the fuzzy logic method compared to the traditional method. Materials and methods: This descriptive survey study used a quantitative approach. The research population was 34 PhD students, and a researcher-made questionnaire was used. Data were analyzed...

Studying the Effect of Retrieval Direction during Reading on Productive and Receptive Knowledge of Vocabulary

Retrieval tasks provide learners with an opportunity to focus both on meaning and on form. There are four different retrieval directions. The present study aimed to identify the optimal direction of recall-type retrievals during reading and to investigate the outcomes of each one. Forty-eight intermediate EFL learners took part in the study. One of the experimental groups was provided with the ...

Combining Image Context Information

Current techniques for content-based image retrieval have known shortcomings that make it difficult to search for images based on their semantic content. This leads to the well-known semantic gap problem. To address this problem, we propose utilizing context information, which is available from multiple sources, such as information generated by the camera at image capture, sensor data, context sources...

Closing the Vocabulary Gap for Computing Text Similarity and Information Retrieval

This paper studies the integration of lexical semantic knowledge in two related semantic computing tasks: ad-hoc information retrieval and computing text similarity. For this purpose, we compare the performance of two algorithms: (i) using semantic relatedness, and (ii) using a conventional extended Boolean model [13] with additional query expansion. For the evaluation, we use two different tes...

Journal title:
  • CoRR

Volume: abs/1711.06004

Publication date: 2017